[fud2] update path finding to support multi state input and output #2134

jku20 · 2024-06-11T23:43:15Z

This PR begins progress towards addressing #1958. It updates the algorithm used by fud2 for finding paths from input files to output files to take in operations which have many inputs and outputs.

The new behavior agrees with the current fud2 in some but not all cases, and it does not preserve fud2's guarantee of always finding the shortest path of operations.

Specifying operations themselves and emitting the Ninja file still assume single state inputs and outputs. This will be changed in further PRs.

sampsyo · 2024-06-12T21:03:39Z

Awesome! I think this is headed in the right direction as a first step. We obviously are already discussing more sophisticated approaches here in #2113, but I guess I am still optimistic that your approach here can work as a stepping stone?

> The new behavior agrees with the current fud2 in some but not all cases, and it does not preserve fud2's guarantee of always finding the shortest path of operations.

Sorry to be somewhat slow on this, but I'm having trouble crystallizing the exact cases where this is true. Would it be possible to jot down a couple of examples (even if they are contrived)? That might help me understand and forget less quickly this time. 🤪

> Specifying operations themselves and emitting the Ninja file still assume single state inputs and outputs. This will be changed in further PRs.

Great idea to defer this to a future change.

jku20 · 2024-06-13T00:30:21Z

oops, probably should have given an example.

One example where behavior disagrees is on a state with a single op from itself to itself. The request asks to go from the state to itself through the given op.

The old algorithm would handle this perfectly fine. The new algorithm as implemented would say no path found because it cannot have two files of the same (initial) state. (Side note: the specific case used, self loops with one input and output, can be hacked in as a special case to fix this, but I wouldn't call having a bunch of special cases like this flexible or "good".)

Another example is one state with two paths to another state. One path is 2 ops long and another path is 1 op long. Assume a request from the first state to the latter and no --through argument.

The old algorithm would always pick the 1 op long path as it is shorter. The new algorithm, however, may pick the 2 op long path because it searches through ops in an arbitrary order.
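To make the second example concrete, here is a minimal sketch of a shortest-plan search over single-input, single-output ops. It uses a plain breadth-first search, which is one standard way to get the "always pick the shorter path" behavior; it is not fud2's actual code, and the `Op` tuple and `shortest_plan` function are hypothetical names invented for illustration.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical single-input, single-output op: (name, from_state, to_state).
type Op = (&'static str, &'static str, &'static str);

/// Breadth-first search over ops. Returns the op names on a shortest plan
/// from `start` to `end`, or None if `end` is unreachable.
fn shortest_plan(ops: &[Op], start: &str, end: &str) -> Option<Vec<&'static str>> {
    // For each reached state, remember the index of the op that reached it first.
    let mut reached_by: HashMap<&str, Option<usize>> = HashMap::new();
    reached_by.insert(start, None);
    let mut queue = VecDeque::from([start]);
    while let Some(state) = queue.pop_front() {
        if state == end {
            break;
        }
        for (i, &(_, from, to)) in ops.iter().enumerate() {
            if from == state && !reached_by.contains_key(to) {
                reached_by.insert(to, Some(i));
                queue.push_back(to);
            }
        }
    }
    // Walk backwards from `end` to `start`, collecting op names.
    let mut plan = Vec::new();
    let mut cur = end;
    while let Some(i) = *reached_by.get(cur)? {
        plan.push(ops[i].0);
        cur = ops[i].1;
    }
    plan.reverse();
    Some(plan)
}
```

With ops `A -> B`, `B -> C`, and `A -> C`, this sketch returns the single-op plan regardless of the order in which the ops are listed, whereas a search that takes the first op it happens to encounter could return the two-op plan.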

sampsyo

Awesome! Thanks for your patience here—this took me a bit longer to review than I expected. But there is a lot of good stuff in here and I had a good time reading it over!

To summarize my view on this PR, I think it contains 3 main things:

1. The general scaffolding to support multi-input/-output ops everywhere (i.e., the driver data structure is now a hypergraph).
2. The new plan-search algorithm using exhaustive enumeration.
3. A new approach to filename generation that is suited to our now more-complex plans.
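For readers unfamiliar with the term, "hypergraph" here means that each op is an edge connecting a set of input states to a set of output states. A rough sketch of that shape follows; `StateId`, `Op`, `is_runnable`, and `is_simple` are hypothetical names for illustration, not the types in this PR.

```rust
use std::collections::HashSet;

/// Hypothetical state identifier; the real driver has its own reference type.
type StateId = u32;

/// A hyperedge: one op that consumes several states and produces several states.
struct Op {
    name: &'static str,
    inputs: Vec<StateId>,
    outputs: Vec<StateId>,
}

impl Op {
    /// An op can run only once every one of its input states is available.
    fn is_runnable(&self, available: &HashSet<StateId>) -> bool {
        self.inputs.iter().all(|s| available.contains(s))
    }

    /// True for the old shape: exactly one input and one output.
    fn is_simple(&self) -> bool {
        self.inputs.len() == 1 && self.outputs.len() == 1
    }
}
```

The "all inputs available" condition is what makes planning harder than an ordinary graph search: reaching one input state of an op is not enough to use that op.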

Here is my current status on each element after spending some time with this PR:

1. I understand the scaffolding stuff completely, and I left detailed comments that are mostly low-level code/docs stuff. Looking really good in general!!
2. I do not really understand the new search algorithm yet, even though it's actually not very much code. The combination of recursion and mutation was too much for me, at least in this review session and with this level of documentation. This is some pretty knotty stuff that I may need your help to fully grok.
3. I mostly understand the name-generation approach, but not 100%. The issue is that this seemingly-simple mechanism has suddenly gotten kinda fundamentally complicated in a way that may need further care.

I definitely don't want to hold up item 1, which is pretty close to being good to go, while I continue to wrestle with item 2 (and to some degree item 3). So I have an idea about how to proceed strategically:

- First, what if we remove the new search algorithm (item 2) from this PR and return to the old DFS thing? We would just slip a couple of assert!s or whatever into the DFS-based code so we panic whenever we encounter any op with more than 1 input or output. But at least the functionality would remain identical and we could get the rest of this PR merged.
  - There is probably no need to similarly separate out the name-generation stuff into a different PR, but we could do it if we really want.
- Second, we do what we've discussed a bit on Zulip/in person and separate out the search algorithm into a separate module with a minimal interface so it is easily replaceable.
- Third, and finally, we reintroduce your enumerative search algorithm alongside the current DFS-based search. This will make it possible to do a thing you mentioned at one point: cross-validate the two implementations against each other, either based on a compile-time option or even a very funky CLI flag.

This strategy would not only let me more properly review the exhaustive search algorithm without holding up the "item 1" refactoring; it would also mean that it would just matter less in general if search algorithms are kinda hard to understand while they are works in progress. What do you think; do you think breaking things down in this way would be feasible?
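The three steps above could look roughly like the following. This is only a sketch under assumptions: the `FindPlan` trait, `LegacySearch`, and `plans_agree` are hypothetical names, and the placeholder search only finds one-op plans to keep the example short.

```rust
/// Hypothetical op shape: names of the input and output states.
struct Op {
    inputs: Vec<&'static str>,
    outputs: Vec<&'static str>,
}

/// Step two: a minimal interface so search algorithms are swappable.
trait FindPlan {
    /// Returns indices into `ops` in execution order, or None if no plan exists.
    fn find_plan(&self, ops: &[Op], start: &str, end: &str) -> Option<Vec<usize>>;
}

/// Step one: a stand-in for the existing single-input/single-output search.
struct LegacySearch;

impl FindPlan for LegacySearch {
    fn find_plan(&self, ops: &[Op], start: &str, end: &str) -> Option<Vec<usize>> {
        for op in ops {
            // Panic early rather than silently mishandle multi-state ops.
            assert!(
                op.inputs.len() == 1 && op.outputs.len() == 1,
                "legacy search only supports single-input, single-output ops"
            );
        }
        // Placeholder logic (assumption): only looks for a direct one-op plan.
        ops.iter()
            .position(|op| op.inputs[0] == start && op.outputs[0] == end)
            .map(|i| vec![i])
    }
}

/// Step three: run two finders on the same request and compare the results.
fn plans_agree(a: &dyn FindPlan, b: &dyn FindPlan, ops: &[Op], start: &str, end: &str) -> bool {
    a.find_plan(ops, start, end) == b.find_plan(ops, start, end)
}
```

Comparing plans for exact equality is the strictest possible check. Since the two algorithms may legitimately pick different valid plans, a real cross-validation would probably need a looser notion of agreement, such as "both found a plan or both found none".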

Also, to call out one high-level suggestion from the comments: would it simplify things if we assumed there was only 1 stdin and 1 stdout file per Request? This seems certain to be true from a UI perspective, and it could alleviate some knottiness at various points.

sampsyo · 2024-06-12T20:59:07Z

fud2/fud-core/src/exec/driver.rs

-    /// Find a chain of Operations from the `start` state to the `end`, which may be a state or the
-    /// final operation in the chain.
-    fn find_path_segment(
+    /// Return parents ops of a given state


The comment (and possibly name) could be a little clearer: find the set of ops that have this state as an output.
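As an illustration of the clearer wording, a helper documented this way might look like the sketch below. The `Op` struct and the name `ops_producing` are hypothetical, not the code under review.

```rust
/// Hypothetical op shape with multiple input and output states.
struct Op {
    name: &'static str,
    inputs: Vec<u32>,
    outputs: Vec<u32>,
}

/// Find the set of ops that have `state` as one of their outputs.
fn ops_producing(ops: &[Op], state: u32) -> Vec<&Op> {
    ops.iter()
        .filter(|op| op.outputs.contains(&state))
        .collect()
}
```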

jku20 · 2024-07-01T21:45:02Z

Thanks for the thorough review!

The steps you describe make sense. Currently working on the "scaffolding" part of the PR.

On your suggestion: I don't think allowing multiple files as input from stdin or multiple files as output to stdout is a big source of difficulty, or really much different from a single in and a single out. It's just a bit of a poor implementation right now, which I need to change. At a very high level, the problem feels like what goes into and comes out of gen_name is confused. I think rewriting/rethinking and documenting it and some of the surrounding code (e.g. the IO enum) will help with a lot of this.

jku20 · 2024-07-03T05:49:47Z

The new commits do the following:

- refactor gen_name as that was particularly poorly implemented.
- move find_path from driver into path
  - add flag to choose between new and old implementations
  - reify find_path as trait objects stored in Request and choose objects based on the CLI (I think that's the right way to describe the implementation?)
- improve documentation
- undo bullet 2 by replacing the new find_path implementation with a todo!()
  - remove tests for the new find_path algorithm
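The "trait objects stored in Request" arrangement described in the list above can be sketched as follows. All names here (`FindPlan`, `LegacyPlanner`, `EnumeratePlanner`, `planner_from_flag`, and the flag values) are hypothetical stand-ins, and the legacy planner's body is a placeholder rather than a real search.

```rust
/// Hypothetical planner interface; the real trait may differ in shape.
trait FindPlan {
    fn name(&self) -> &'static str;
    fn find_plan(&self, start: &str, end: &str) -> Option<Vec<String>>;
}

struct LegacyPlanner;
struct EnumeratePlanner;

impl FindPlan for LegacyPlanner {
    fn name(&self) -> &'static str {
        "legacy"
    }
    fn find_plan(&self, start: &str, end: &str) -> Option<Vec<String>> {
        // Placeholder (assumption): pretend one op connects any two states.
        Some(vec![format!("{start}-to-{end}")])
    }
}

impl FindPlan for EnumeratePlanner {
    fn name(&self) -> &'static str {
        "enumerate"
    }
    fn find_plan(&self, _start: &str, _end: &str) -> Option<Vec<String>> {
        // Left unimplemented until the algorithm lands in its own PR.
        todo!("enumerative search")
    }
}

/// The request owns whichever planner was selected.
struct Request {
    planner: Box<dyn FindPlan>,
}

/// Map a hypothetical CLI value to a planner; unknown values fall back to legacy.
fn planner_from_flag(flag: &str) -> Box<dyn FindPlan> {
    match flag {
        "enumerate" => Box::new(EnumeratePlanner),
        _ => Box::new(LegacyPlanner),
    }
}
```

Because `todo!()` only panics when it is actually executed, the unfinished planner compiles and can be selected, but the default code path never reaches it.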

Question: As mentioned in the final bullet, I've currently settled on replacing the find_path function in EnumeratePathFinder with a todo!() and copying the old code over in a new PR when that happens. This is to separate the new path finding algorithm into a new PR. Is there a better way to do this?

sampsyo · 2024-07-04T01:05:20Z

Awesome; I will attempt to take a look as soon as I can!

> Question: As mentioned in the final bullet, I've currently settled on replacing the find_path function in EnumeratePathFinder with a todo!() and copying the old code over in a new PR when that happens. This is to separate the new path finding algorithm into a new PR. Is there a better way to do this?

That seems like a great tactic to me!! Yay! I see no problems with leaving a todo!() in place, since it doesn't seem to break the existing codepath.

sampsyo

Wahoo! This looks fantastic! I only found a few super minor things to comment on. Feel free to hit the big green button whenever you want, after either incorporating or ignoring my comments. 😃

Next, I suppose we can do a separate PR with the enumerative search thingy?

sampsyo · 2024-07-06T15:22:42Z

fud2/fud-core/tests/tests.rs

Perhaps this empty file should be deleted?

jku20 added 4 commits June 7, 2024 13:48

generalize op type

59e63b3

output used output states with operation

2641b54

generalize find_path for many inputs and outputs

a3b9db2

cargo fmt

66a3f15

jku20 marked this pull request as draft June 12, 2024 00:31

jku20 added 4 commits June 13, 2024 14:18

make op graph use adjacency lists and algo change

d9a3fb3

fix find_path algorithm

b1d83b8

remove debug print statements

7d789ab

remove extra parenthesis

edbafaf

jku20 marked this pull request as ready for review June 19, 2024 04:28

jku20 added 13 commits June 19, 2024 01:57

stop unreachable ops being considered for plans

d4b84ce

making file naming not depend on stems and paths

75d6d85

generalize emit logic

1bf07a2

implement requests for multiple inputs and outputs

d36f79d

cargo fmt

a035a94

use simple enumerative search for find_path

4090f6e

cargo clippy

252e898

Merge branch 'main' into fud2-multi-input-output-find-path

2967705

fix off by one

7a5d63a

modify cli to support multiple inputs and outputs

de0dbb5

fix name generation on cyclic plans

9fcfab7

remove intermediate file dedup

58b402d

remove commented out prints

bd596e0

jku20 requested a review from sampsyo June 28, 2024 07:16

sampsyo reviewed Jun 29, 2024

refactors for readability

acdbf4e

document IO

574e2d6

jku20 added 13 commits July 1, 2024 22:20

fix typo

e711f81

clarify iteration using any

47699b8

fix documentation

29820aa

refactor and document gen_name

20e3fb4

restore relative generated file path

8d557f5

refactor gen_name and IO

afe6194

factor out path find code and use enumerate search

d11dd29

convert default impl to macro

fbfba5e

add option to choose between old and new plan algo

3b5afb7

simplify and document enumeration find_path algo

4a7fb63

documentation improvements

cf3526f

document FindPath

386948d

remove new find_path algorithm

3c28fcf

jku20 requested a review from sampsyo July 3, 2024 06:07

sampsyo reviewed Jul 6, 2024

jku20 added 2 commits July 8, 2024 11:09

rename path to plan and fix typos

c71a6b7

more typos

26b59ae

jku20 merged commit 259c2c8 into main Jul 8, 2024
18 checks passed

jku20 deleted the fud2-multi-input-output-find-path branch July 8, 2024 16:29
